Nov 29, 2017

About Me: Hui Lin

What is data science?

  • Does big data matter?
  • Back in 1962, John Tukey wrote in “The Future of Data Analysis”:

For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt. … All in all, I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.

  • The DSI website gives us an idea of what data science is:

“This coupling of scientific discovery and practice involves the collection, management, processing, analysis, visualization, and interpretation of vast amounts of heterogeneous data associated with a diverse array of scientific, translational, and interdisciplinary applications.”

What is a data scientist?

Here is a list of definitions for a “data scientist”:

  • “A data scientist is a data analyst who lives in California”
  • “A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.”
  • “A data scientist is a statistician who lives in San Francisco.”
  • “Data Science is statistics on a Mac.”

%&^%$*(^)…..

What is “hard-core pornography”?

  • Obscenity case of Jacobellis v. Ohio (1964)

“I know it when I see it.” (Potter Stewart)

Brief History

Driving Forces

  • John Tukey identified 4 forces driving data analysis (there was no “data science” then):
  1. The formal theories of math/stat
  2. Accelerating developments in computers and display devices
  3. The challenge, in many fields, of more and ever larger bodies of data
  4. The emphasis on quantification in an ever wider variety of disciplines

What questions can data science answer?

  • Specific
    1. How can we increase sales?
    2. Does the January campaign on product X increase the purchase amount from our 2017 retained customers?
  • Data
    1. Representative
    2. Relevant
    3. Quality

Types of Questions

Types of Learning

Types of Algorithm

Data Scientist Skill Set

General Process

Automatic Data Science Pipeline

Some links